bpf: support XDP metadata #6

Open: aspsk wants to merge 1 commit into base: master
Conversation

@aspsk commented Nov 8, 2019

An XDP program can use the bpf_xdp_adjust_meta() hook to prepend up to 32 bytes
of metadata before the packet. Support this functionality by copying the XDP meta
to local memory after BPF program execution and pushing it back right
before delivering the packet.

Signed-off-by: Anton Protopopov [email protected]

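For context, this is roughly how an XDP program uses bpf_xdp_adjust_meta(); the sketch below is illustrative only and not part of this patch:

```c
/* Illustrative sketch: an XDP program that prepends 4 bytes of metadata
 * in front of the packet via bpf_xdp_adjust_meta(). The metadata length
 * must be a multiple of 4 and at most 32 bytes. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_prepend_meta(struct xdp_md *ctx)
{
	__u32 *meta;
	void *data;

	/* Grow the metadata area by one 32-bit word. */
	if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta)))
		return XDP_ABORTED;

	meta = (void *)(long)ctx->data_meta;
	data = (void *)(long)ctx->data;

	/* Bounds check required by the verifier. */
	if ((void *)(meta + 1) > data)
		return XDP_ABORTED;

	*meta = 0x42;	/* value later visible to e.g. a TC classifier */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";
```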
@epeer-juniper left a comment

I've taken a quick look at this and it looks like it will work. I would suggest extending the size of the existing metadata region in the packet vector LM rather than defining a new region. The larger region might also mean unrolling the save/restore loops becomes a bit too expensive in terms of code store utilization. Might be worth maintaining the unroll for small amounts of metadata and resorting to looping when there is a large amount of metadata to copy.
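For illustration, a C sketch of that hybrid strategy (the firmware itself uses microcode preprocessor unrolling; the 2-word threshold and helper name below are arbitrary choices, not from this PR):

```c
/* Sketch: copy small metadata with straight-line stores, fall back to a
 * loop for larger amounts so code store usage stays bounded. */
#include <stddef.h>
#include <stdint.h>

static void xdp_meta_copy(uint32_t *dst, const uint32_t *src, size_t words)
{
	if (words <= 2) {
		/* "Unrolled" fast path: no loop overhead for tiny metadata. */
		if (words >= 1)
			dst[0] = src[0];
		if (words == 2)
			dst[1] = src[1];
	} else {
		/* Generic loop for larger metadata. */
		for (size_t i = 0; i < words; i++)
			dst[i] = src[i];
	}
}
```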

@epeer-juniper left a comment

Where does the metadata get copied into the packet and how are you communicating the length to the driver?

@aspsk (Author) commented Nov 13, 2019

Hi @epeer-juniper, thanks for the comments!

The length of XDP metadata is always <= 32 bytes and it is a multiple of four (because of this optimization: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/skbuff.h#n3698), thus I've used one free bit and three more free bits in the packet vector structure to indicate that XDP meta is present and to store its length, respectively (see changes here: https://github.com/Netronome/nic-firmware/pull/6/files#diff-d8ec451c070a9abe40f206ec0f0ce23aL77)

The XDP metadata should immediately precede the packet head. When a BPF program executes, it writes the metadata there and stores its length in the PV as described above. Then xdp_meta_save() saves it to LM, and just before delivering the packet xdp_meta_restore() puts it back into packet memory, right after the "netronome metadata":

|netronome meta|XDP meta|packet|

The length of the XDP meta is passed to the kernel as a new field type inside the "netronome meta", see this line: https://github.com/Netronome/nic-firmware/pull/6/files#diff-510f9a499850424deb9ad7fc7d93beefR171

Also, could you please clarify where and how you suggest allocating memory and placing the XDP meta? To make room I am extending the packet vector in LM by 32 bytes, so technically it's not a new region; the PV just grows in size.
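As a rough illustration of the encoding described above (the flag position, field position, and "words minus one" packing below are placeholders, not the actual PV layout from the patch):

```c
/* Sketch: XDP metadata is <= 32 bytes and a multiple of 4, so a presence
 * flag plus a 3-bit word count is enough to describe it. Bit positions
 * and the exact packing here are placeholders. */
#include <assert.h>
#include <stdint.h>

#define XDP_META_PRESENT   (1u << 3)   /* placeholder for the "X" flag   */
#define XDP_META_WORDS_MSK 0x7u        /* placeholder for the "XML" field */

static uint32_t xdp_meta_encode(uint32_t len_bytes)
{
	assert(len_bytes >= 4 && len_bytes <= 32 && (len_bytes & 3) == 0);
	return XDP_META_PRESENT | ((len_bytes / 4 - 1) & XDP_META_WORDS_MSK);
}

static uint32_t xdp_meta_len(uint32_t field)
{
	if (!(field & XDP_META_PRESENT))
		return 0;
	return ((field & XDP_META_WORDS_MSK) + 1) * 4;
}
```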

#while (LOOP > 1)

t/**/LOOP#:
mem[read32, $tr, in_pkt_addr_hi, <<8, xdp_meta_base, 1], ctx_swap[sig_rd]
Contributor left a comment

I am guessing we do not have enough xfer registers to read this in one block from xdp_meta_base with an indirect_ref?
This must be the case, since this macro is called from a block of code that calls pv_invalidate_cache.

@aspsk (Author) left a comment

Yes, this is the case. What is the right way to fix this?

Contributor left a comment

I think you can re-use $__pv_pkt_data[32], which is used to cache packet header data. When you use pv_save_xdp_meta in ebf_reentry, you can do pv_invalidate_cache(in_vec) afterward. On the next pv_seek the packet header will be re-read.

Comment on lines -77 to +79
-  * 11 | Sequence Number | --- | Seq Ctx | Protocol | 3
+  * 11 | Sequence Number | XML | Seq Ctx | Protocol | 3
   * +-------------------------------+-+-+-+---------+---+-+-+-+-+-+-+
-  * 12 | TX Host Flags |M|B|Seek (64B algn)|-|Q|I|i|C|c| 4
+  * 12 | TX Host Flags |M|B|Seek (64B algn)|X|Q|I|i|C|c| 4

Contributor left a comment

These changes will clash with other work Netronome has incoming. Specifically, we want to grow the Seq Ctx field by one bit, and we have a different use for the X bit. We might have to add another word to the PV, which has a knock-on effect everywhere else. @mrapson-netronome and I are discussing this.

@aspsk (Author) left a comment

Sure, I was prepared to re-implement the layout when pushing the patch. Ideally I would have at least 4 bits to store the length of the XDP meta (in words).

XDP meta is currently <= 32 bytes and comes in 4-byte chunks, so I need at least 4 bits to store the length. Maybe in the future it will be extended to 64 or 128 bytes (still in 4-byte chunks), so I would like to have 6 or more bits, so that we won't have to change the firmware/kernel ABI again for this.
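To make the bit counts above concrete (a minimal sketch, nothing firmware-specific; it just computes how wide a word-count field must be):

```c
/* Sanity check of the bit budget: a length of up to N bytes stored as a
 * count of 4-byte words needs enough bits to hold 0..N/4. */
#include <stdio.h>

static unsigned bits_needed(unsigned max_words)
{
	unsigned bits = 0;

	/* Smallest width whose range 0..(2^bits - 1) covers max_words. */
	while ((1u << bits) <= max_words)
		bits++;
	return bits;
}

int main(void)
{
	printf("32B  -> %u words -> %u bits\n",  32 / 4, bits_needed(32 / 4));   /* 8 words  -> 4 bits */
	printf("128B -> %u words -> %u bits\n", 128 / 4, bits_needed(128 / 4));  /* 32 words -> 6 bits */
	return 0;
}
```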

Contributor left a comment

The situation is that we are running out of local memory on the datapath MEs.
This is the layout without your change:

512                 i32.me0._PKT_IO_PKT_VEC
256                 i32.me0.gro_cli_lm_ctx
32                  i32.me0.__actions_sriov_keys
16                  i32.me0.nfd_in_ring_info
16                  i32.me0.nfd_out_ring_info
2048                i32.me0.EBPF_STACK_BASE
Total used 2880B of 4096B

With your change:

1024                i32.me0._PKT_IO_PKT_VEC
256                 i32.me0.gro_cli_lm_ctx
32                  i32.me0.__actions_sriov_keys
16                  i32.me0.nfd_in_ring_info
16                  i32.me0.nfd_out_ring_info
2048                i32.me0.EBPF_STACK_BASE
Total used 3392B of 4096B

You will notice that changing PV_SIZE_LW from 16 to 24 doubles the _PKT_IO_PKT_VEC size. That is because it must be a power-of-2 size. See the alloc logic in pv.uc line 304. The upshot is that there are 8 more words available in the PV that are not shown in the ASCII art. Perhaps you can move your extra fields to this unused section?
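For illustration, the power-of-2 rounding that causes the doubling (the multiplication by 8 contexts per ME is my assumption about how the 512B/1024B totals arise, not taken from pv.uc):

```c
/* Sketch: why raising PV_SIZE_LW from 16 to 24 doubles _PKT_IO_PKT_VEC.
 * The per-PV allocation is rounded up to a power-of-2 number of longwords. */
#include <stdio.h>

static unsigned round_up_pow2(unsigned x)
{
	unsigned p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

int main(void)
{
	for (unsigned pv_size_lw = 16; pv_size_lw <= 24; pv_size_lw += 8) {
		unsigned alloc_lw = round_up_pow2(pv_size_lw);

		/* Assumes one PV per ME context, 8 contexts, 4 bytes per LW. */
		printf("PV_SIZE_LW=%u -> alloc %u LW -> %u bytes total\n",
		       pv_size_lw, alloc_lw, alloc_lw * 4 * 8);
	}
	return 0;   /* prints 512 bytes for 16 LW and 1024 bytes for 24 LW */
}
```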

If we add to the above the features that Netronome plans to introduce soon:

1024                i32.me0._PKT_IO_PKT_VEC
640                 i32.me0.gro_cli_lm_ctx
16                  i32.me0.nfd_in_ring_info
16                  i32.me0.nfd_out_ring_info
32                  i32.me0._PKT_IO_RX_META
256                 i32.me0.__actions_keys <- had to "break" alignment
2048                i32.me0.EBPF_STACK_BASE
Total used 4032B of 4096B

You can see that we are really out of LM (and I have to violate the alignment requirement of the symbol __actions_keys). Thus there really isn't space to grow the XDP metadata.

If we need to go above 32B, we'll have to consider alternatives, like maintaining the metadata in CLS, or directly in the packet buffer.

Comment on lines +87 to +88
* 16-23 | XDP Metadata | 0-7 <- ACTIVE_LM_ADDR_2 ()
* +---------------------------------------------------------------+
@JohanM84 (Contributor) commented Nov 22, 2019

Do we need to lock down LM addr2 for this? Are we sure it's not used elsewhere?

Edit: I think you have to update pv_push and pv_pop as well?

@aspsk (Author) left a comment

I am allocating data by extending PV_SIZE_LW to 24 words. Shouldn't pv_push and pv_pop work as is?

Contributor left a comment

Maybe... It certainly looks like it from inspecting the code. (Note to self: write a unit test to confirm this.)

@JohanM84 (Contributor) left a comment

Hi @aspsk. I left some comments. We (Netronome) still need to run tests on your contribution. Unfortunately, the current unit tests in the repo do not cover all the code you are touching. Furthermore, we need to discuss the layout of the PV.

@aspsk (Author) commented Dec 1, 2019

> Hi @aspsk. I left some comments. We (Netronome) still need to run tests on your contribution. Unfortunately, the current unit tests in the repo do not cover all the code you are touching. Furthermore, we need to discuss the layout of the PV.

Hi @JohanM84, thanks for your comments. Let me know how you would like to proceed with the layout, and I will update my patches accordingly.
